PART 1: Distributions and samples
So sampling is our only rescue.
Population parameters (in cm):
\(\mu = 175\) and \(\sigma = 7\)
Our full population of 10 million adults:
| id | height |
|---|---|
| 1 | 174 |
| 2 | 198 |
| … | … |
| 9999999 | 156 |
| 10000000 | 180 |
We are the researchers now:
We do not know all this about the population!!!
We want to know how tall Dutch adults are.
Let’s start…
\(n=3\)
| id | height |
|---|---|
| 693610 | 158.61 |
| 8177752 | 181.76 |
| 9426218 | 172.30 |
\(M = \frac{\sum{{X}}}{n} = \frac{158.61+181.76+172.30}{3} = \frac{512.67}{3} = 170.89\)
\(SD = \sqrt{\frac{SS}{n-1}}\)
\(SS =\sum{{(X-M)^2}} = (158.61-170.89)^2+... = 270.94\)
\(SD = \sqrt{\frac{270.94}{3-1}} =\sqrt{135.47} =11.64\)
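The slides' output blocks suggest the original used R; as a sketch, the same sample statistics can be computed in Python (heights taken from the sample table above):

```python
import math

# sample of n = 3 heights (values from the table above)
heights = [158.61, 181.76, 172.30]
n = len(heights)

# sample mean: M = sum(X) / n
M = sum(heights) / n

# sum of squared deviations: SS = sum((X - M)^2)
SS = sum((x - M) ** 2 for x in heights)

# sample SD uses n - 1 in the denominator (Bessel's correction)
SD = math.sqrt(SS / (n - 1))

print(round(M, 2))   # 170.89
print(round(SD, 2))  # 11.64
```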
When we obtain statistics from our data, we talk about:
- population parameters: \(\mu\) and \(\sigma\)
- sample statistics: \(M\) and \(SD\)
The sampling error is the difference between the two.
Here: \(M - \mu = 170.89 - 175 = -4.11\)
We repeat the sampling process: we now take two samples of \(n=3\).
| sample | n | mean_height |
|---|---|---|
| 1 | 3 | 170.89 |
| 2 | 3 | 175.21 |
Repeated sampling: 10 times a sample of \(n=3\)
| sample | n | mean_height |
|---|---|---|
| 1 | 3 | 170.89 |
| 2 | 3 | 175.21 |
| 3 | 3 | 175.40 |
| 4 | 3 | 172.42 |
| 5 | 3 | 177.65 |
| 6 | 3 | 180.61 |
| 7 | 3 | 169.51 |
| 8 | 3 | 179.00 |
| 9 | 3 | 179.69 |
| 10 | 3 | 174.82 |
Why don’t we also increase sample size \(n\)?
| sample | n | mean_height |
|---|---|---|
| 1 | 20 | 175.17 |
| 2 | 20 | 174.15 |
| 3 | 20 | 178.14 |
| 4 | 20 | 173.99 |
| 5 | 20 | 173.45 |
| 6 | 20 | 177.54 |
| 7 | 20 | 175.86 |
| 8 | 20 | 172.75 |
| 9 | 20 | 176.58 |
| 10 | 20 | 175.04 |
We now have sampled 10 times with \(n=20\).
The mean of the 10 means is:
## [1] 175.27
Think back to what we did: we increased the number of samples and the sample size to reduce the sampling error.
Remember: we want to estimate the population mean \(\mu\) from the sample mean \(M\) (we do not ever have access to the population).
Do we thus need to take many, many samples with big sample sizes?
Luckily, there is a mathematical theorem to our rescue!
The central limit theorem states that:
The distribution of sample means has a mean of \(\mu\) and a standard deviation of \(\frac{\sigma}{\sqrt{n}}\).
And: it will approach the normal distribution with increasing \(n\)
This is like a life saver.
Shape:
The distribution of sample means approaches the normal distribution if:
- the population itself is normally distributed, or
- the sample size is sufficiently large (a common rule of thumb is \(n \geq 30\))
Central tendency (mean):
The mean of the distribution of sample means (across all possible samples) is \(\mu\).
But we do not always have all possible samples (actually: never!).
So we only know that \(M \approx \mu\). Thus we need some kind of “variability indicator” for the mean (of the sample means)…
Variability of the mean: the standard error of the mean
\(SE = \sigma_M = \frac{\sigma}{\sqrt{n}}\)
This can also be written as: \(SE = \sqrt{\frac{\sigma^2}{n}}\)
We take a sample of \(n=1\) from our height data and get:
## [1] 170.77
The standard error here is \(SE = \frac{\sigma}{\sqrt{n}} = \frac{7}{\sqrt{1}} = 7\)
With \(n=1\), \(SE = \sigma\).
Remember, our population had \(\mu=175\) and \(\sigma=7\).
| n | SE |
|---|---|
| 1 | 7.00 |
| 2 | 4.95 |
| 3 | 4.04 |
| 4 | 3.50 |
| 5 | 3.13 |
| 10 | 2.21 |
| 100 | 0.70 |
| 1000 | 0.22 |
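The SE column above follows directly from \(SE = \sigma/\sqrt{n}\); a small Python sketch (using the slides' \(\sigma=7\)) reproduces the table:

```python
import math

SIGMA = 7  # population standard deviation from the slides

# SE = sigma / sqrt(n) for each sample size in the table above
se_table = {n: round(SIGMA / math.sqrt(n), 2) for n in [1, 2, 3, 4, 5, 10, 100, 1000]}

for n, se in se_table.items():
    print(n, se)  # e.g. 1 7.0, 10 2.21, 1000 0.22
```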
If we know:
- the population mean \(\mu\)
- the population standard deviation \(\sigma\)
- the sample size \(n\)
Then we can use the CLT to find the shape, mean and standard deviation (standard error) of the distribution of sample means!
Our height data with \(\mu=175\) and \(\sigma=7\).
We take a sample of \(n=60\).
So, the distribution of sample means has:
- mean: \(\mu = 175\)
- standard error: \(SE = \frac{\sigma}{\sqrt{n}} = \frac{7}{\sqrt{60}} \approx 0.90\)
- shape: approximately normal (since \(n \geq 30\))
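A quick simulation (a sketch, assuming the population is \(N(175, 7)\)) illustrates the CLT claim for \(n=60\): simulated sample means cluster around \(\mu = 175\) with spread close to \(7/\sqrt{60} \approx 0.90\).

```python
import math
import random
import statistics

random.seed(1)  # arbitrary seed for reproducibility
MU, SIGMA, N = 175, 7, 60

# draw many samples of size N and record each sample's mean
means = [statistics.mean(random.gauss(MU, SIGMA) for _ in range(N))
         for _ in range(5000)]

print(round(statistics.mean(means), 1))  # close to mu = 175
print(round(statistics.stdev(means), 2)) # close to 7 / sqrt(60) ≈ 0.90
```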
Given our height data with \(\mu=175\) and \(\sigma=7\):
We take a sample of \(n=100\). What is the probability that the mean height of that sample is 177 cm or higher?
Information about the distribution of sample means, step-wise:
The distribution of sample means has mean \(\mu = 175\) and \(SE = \frac{7}{\sqrt{100}} = 0.7\).
Obtain z-score: \(z = \frac{M-\mu}{\sigma_M} = \frac{177-175}{0.7} \approx 2.86\)
Locate area of interest: the tail proportion beyond \(z = 2.86\) is .0021 (from the unit normal table).
Translate proportions to probabilities: a proportion of .0021 corresponds to a probability of 0.21%.
The probability of the sample of \(n=100\) having a mean of 177 or higher is 0.0021 (0.21%)
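The same steps can be sketched in Python, using the standard library's `statistics.NormalDist` in place of the unit normal table:

```python
from statistics import NormalDist

MU, SIGMA, N = 175, 7, 100
M = 177  # sample mean of interest

SE = SIGMA / N ** 0.5  # standard error: 7 / 10 = 0.7
z = (M - MU) / SE      # z-score: 2 / 0.7 ≈ 2.86

# P(mean >= 177) = upper-tail area beyond z under the standard normal
p = 1 - NormalDist().cdf(z)
print(round(z, 2))  # 2.86
print(round(p, 4))  # 0.0021
```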
We can calculate two kinds of z-scores:
- for an individual score: \(z = \frac{X-\mu}{\sigma}\)
- for a sample mean: \(z = \frac{M-\mu}{\sigma_M}\)
Note: in hypothesis testing, we are mostly interested in sample mean comparisons!
Sampling error = the difference between population parameters and sample statistics.
In other words: with increasing \(n\), we decrease the standard error \(SE\), and thereby reduce the sampling error!
PART 2: Hypothesis testing
We want to test a hypothesis about a population.
Because we cannot ever have access to the whole population, we need to work with a sample.
i.e. we are interested in making an inference (we are now entering inferential statistics territory) about a population from a sample
Extra lessons for the intro to statistics exam.
Let’s walk through this step-by-step.
Suppose the “intro to stats” exam grade \(X\) follows the distribution \(X \sim N(6.9, 1.1)\) (mean 6.9, standard deviation 1.1).
You are now testing whether extra lessons have an effect on the exam grade.
So you are formulating a hypothesis as follows:
The null hypothesis:
\(H_0: \mu = 6.9\) (extra lessons have no effect on the grade)
The alternative hypothesis:
\(H_1: \mu > 6.9\) (extra lessons improve the grade)
You have access to a sample of \(n=49\) students who took extra lessons.
NHST = null hypothesis significance testing
We know that the distribution of sample means under \(H_0\) with \(n=49\) has a mean of \(\mu=6.9\) and a standard deviation of \(\sigma_M = \frac{\sigma}{\sqrt{n}} = \frac{1.10}{\sqrt{49}} = \frac{1.10}{7} \approx 0.16\)
If the observed sample mean (from our \(n=49\) sample with extra lessons) is very unlikely under the data we would expect if \(H_0\) were true, we reject the null.
This is why it is called null hypothesis significance testing.
But what does very unlikely mean?
In NHST, very unlikely is translated to statistically significantly different.
The threshold for “very unlikely” is the significance level, also called: the alpha level.
e.g. an alpha level of \(\alpha = 0.01\) means that we deem a value unlikely (or statistically significantly different) if the probability of observing it is smaller than \(\alpha\).
Remember: we know this probability stuff and the idea of “unlikely”!
The alpha level corresponds exactly to regions on the distribution.
More specifically:
We can locate the z-scores that correspond to tail proportions (and hence: probabilities).
Important:
If we have a higher than or lower than \(H_1\), then we call this a directional hypothesis.
Example:
\(\alpha = 0.05\) and a directional \(H_1\) needs a z-score that has a tail prob. of 0.05.
Important:
If we have a different than \(H_1\), then we call this a non-directional hypothesis (i.e. we simply state that it is different than what we expect under the null but have no idea in which direction).
Example:
\(\alpha = 0.05\) and a non-directional \(H_1\) needs a z-score that has a tail prob. of 0.025 in each tail (because \(\alpha\) spreads to both tails!!!).
Directional alternative hypotheses: one-tailed test, all of \(\alpha\) goes into one tail.
Non-directional alternative hypotheses: two-tailed test, \(\alpha\) is split into \(\alpha/2\) per tail.
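The critical z-scores for \(\alpha = 0.05\) can be looked up with the inverse normal CDF; a sketch using Python's `statistics.NormalDist`:

```python
from statistics import NormalDist

ALPHA = 0.05

# directional H1: all of alpha in one tail
z_one_tailed = NormalDist().inv_cdf(1 - ALPHA)

# non-directional H1: alpha split over both tails (alpha/2 per tail)
z_two_tailed = NormalDist().inv_cdf(1 - ALPHA / 2)

print(round(z_one_tailed, 3))  # 1.645
print(round(z_two_tailed, 2))  # 1.96
```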
Since we have a directional \(H_1\) that states \(H_1: \mu > 6.9\), we load all unlikeliness to the right tail.
We have analysed the data of our \(n=49\) sample:
The sample mean is \(M=7.46\)
We obtain the z-score for the sample mean (see p. 210 in the book).
\(z=\frac{M-\mu}{\sigma_M} = \frac{7.46-6.90}{0.16} = \frac{0.56}{0.16} = 3.5\)
Thus: since it is a directional \(H_1\), we look at the tail prob. for \(z=3.5\).
| z | body | tail | M-to-z |
|---|---|---|---|
| 3.50 | .9998 | .0002 | .4998 |
Observing a mean of \(M=7.46\) or higher has a probability of 0.0002 (or 0.02%) under the null hypothesis.
This is lower than our pre-defined threshold of \(\alpha = 0.01\):
We therefore reject the null hypothesis.
Our data support the alternative hypothesis that extra lessons did improve the grade.
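The whole test can be reproduced in a few lines (a sketch with the slides' numbers; note the slides round \(\sigma_M\) to 0.16, giving \(z = 3.5\), while the unrounded SE gives \(z \approx 3.56\)):

```python
from statistics import NormalDist

MU0, SIGMA, N = 6.9, 1.1, 49  # null-hypothesis population and sample size
M = 7.46                      # observed sample mean
ALPHA = 0.01

SE = SIGMA / N ** 0.5  # 1.1 / 7 ≈ 0.157
z = (M - MU0) / SE     # ≈ 3.56 with the unrounded SE (slides: 3.5)

# directional H1 (mu > 6.9): the p-value is the upper-tail area beyond z
p = 1 - NormalDist().cdf(z)

print(round(z, 2))
print(p < ALPHA)  # True -> reject the null
```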
0.0002 is the p-value!
Written as \(p=.0002\)
PART 3: Errors in inference
Remember:
Two kinds of errors: Type 1 errors and Type 2 errors
Analogy: false positives
We conclude there is a difference (an effect), but it’s a false alarm (in reality there is no effect).
In hypothesis terms: we reject the null but shouldn’t have done so.
We want to keep that error low.
i.e. we want to be quite sure that there is an effect.
This is all contained in the alpha level: under the null, a proportion of exactly \(\alpha\) lies in the critical region.
For \(\alpha=0.01\), 1% of the values under the null lie in that area.
Thus: in 1% of the cases (in which the null is actually true), we will incorrectly conclude that there is an effect.
Analogy: missed effects.
We conclude that there is no difference, but in reality there is one (i.e. we miss the effect).
In hypothesis terms: we fail to reject the null hypothesis although we should have done so.
This error term is called \(\beta\).
More on this in the week on statistical power.